Plan for this session:
The network we are going to build is based on a relationship studied in this paper:
In that paper, the researcher builds a matrix of relationships like this:
The data was not available from the author’s website, so the matrix you see above was copied and pasted to Excel:
# opening excel
library(rio)
linkAdjMx='https://github.com/EvansDataScience/CTforGA_Networks/raw/main/dataFigueroa.xlsx'
adjacency=import(linkAdjMx,which = 1)
This data is organized as an adjacency matrix, so it should be square (as many rows as columns):
dim(adjacency)
## [1] 37 38
Let’s take a look:
head(adjacency)
## Names Romero Grana Miro Quesada Moreyra Fort De La Puente Wiese
## 1 Romero 0 1 1 1 1 1 0
## 2 Grana 1 0 1 0 1 1 1
## 3 Miro Quesada 1 1 0 0 1 1 1
## 4 Moreyra 1 0 0 0 1 1 1
## 5 Fort 1 1 1 1 0 1 0
## 6 De La Puente 1 1 1 1 1 0 1
## Onrubia Brescia Nicolini Montero Picaso Bentin Benavides Bustamante
## 1 1 1 1 0 0 1 1 1
## 2 0 0 0 1 0 0 1 1
## 3 0 0 0 1 0 0 1 1
## 4 1 1 0 1 1 1 0 1
## 5 1 1 1 0 1 1 1 1
## 6 0 0 0 1 0 0 1 1
## Woodman Pollit Raffo Piazza Berckemeyer Llosa Barber Beoutis Ledesma
## 1 1 1 1 1 1 0
## 2 0 0 1 0 0 1
## 3 0 0 1 0 0 1
## 4 0 1 0 1 0 0
## 5 1 1 1 0 1 1
## 6 0 0 1 1 0 1
## Rizo Patron Montori Sotomayor Cilloniz Ferreyros Michell Wong Lu
## 1 1 1 0 0 0 0 0
## 2 0 0 0 0 0 1 0
## 3 0 0 0 0 0 1 0
## 4 1 1 1 0 0 0 0
## 5 0 0 1 1 0 0 0
## 6 1 0 1 0 0 0 0
## Batievsky Spack Matos Escalada Galsky Lucioni Rodriguez Rodriguez Custer
## 1 0 0 0 0 0 0
## 2 0 0 0 0 0 0
## 3 0 0 0 0 0 0
## 4 0 0 0 0 0 0
## 5 0 0 0 0 0 0
## 6 0 0 0 0 0 0
## Ikeda Cogorno Arias Davila
## 1 0 0 0
## 2 0 0 0
## 3 0 0 0
## 4 0 0 0
## 5 0 0 0
## 6 0 0 0
The extra column is Names. Let’s use it as the row names and then drop it, which leaves a square matrix:
row.names(adjacency)=adjacency$Names
adjacency$Names=NULL
# then
head(adjacency)
## Romero Grana Miro Quesada Moreyra Fort De La Puente Wiese Onrubia
## Romero 0 1 1 1 1 1 0 1
## Grana 1 0 1 0 1 1 1 0
## Miro Quesada 1 1 0 0 1 1 1 0
## Moreyra 1 0 0 0 1 1 1 1
## Fort 1 1 1 1 0 1 0 1
## De La Puente 1 1 1 1 1 0 1 0
## Brescia Nicolini Montero Picaso Bentin Benavides Bustamante
## Romero 1 1 0 0 1 1 1
## Grana 0 0 1 0 0 1 1
## Miro Quesada 0 0 1 0 0 1 1
## Moreyra 1 0 1 1 1 0 1
## Fort 1 1 0 1 1 1 1
## De La Puente 0 0 1 0 0 1 1
## Woodman Pollit Raffo Piazza Berckemeyer Llosa Barber
## Romero 1 1 1 1 1
## Grana 0 0 1 0 0
## Miro Quesada 0 0 1 0 0
## Moreyra 0 1 0 1 0
## Fort 1 1 1 0 1
## De La Puente 0 0 1 1 0
## Beoutis Ledesma Rizo Patron Montori Sotomayor Cilloniz Ferreyros
## Romero 0 1 1 0 0 0
## Grana 1 0 0 0 0 0
## Miro Quesada 1 0 0 0 0 0
## Moreyra 0 1 1 1 0 0
## Fort 1 0 0 1 1 0
## De La Puente 1 1 0 1 0 0
## Michell Wong Lu Batievsky Spack Matos Escalada Galsky Lucioni
## Romero 0 0 0 0 0 0
## Grana 1 0 0 0 0 0
## Miro Quesada 1 0 0 0 0 0
## Moreyra 0 0 0 0 0 0
## Fort 0 0 0 0 0 0
## De La Puente 0 0 0 0 0 0
## Rodriguez Rodriguez Custer Ikeda Cogorno Arias Davila
## Romero 0 0 0 0 0
## Grana 0 0 0 0 0
## Miro Quesada 0 0 0 0 0
## Moreyra 0 0 0 0 0
## Fort 0 0 0 0 0
## De La Puente 0 0 0 0 0
The data is still stored as a data frame, so it now has to be converted into a matrix.
adjacency=as.matrix(adjacency) # This coerces the object into a matrix, just in case
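An undirected network assumes that the matrix is square, symmetric, and labelled the same way on rows and columns. Since this matrix was copied by hand, a quick check may be worthwhile. The sketch below runs those checks on a made-up 3x3 matrix (`toy` and its values are only illustrative, not data from the paper); the same three lines could be applied to `adjacency`.

```r
# Toy adjacency matrix (hypothetical data, for illustration only)
toy = matrix(c(0,1,1,
               1,0,0,
               1,0,0), nrow=3, byrow=TRUE,
             dimnames=list(c("A","B","C"), c("A","B","C")))

isSquare  = nrow(toy)==ncol(toy)                       # same number of rows and columns
isSym     = isSymmetric(toy)                           # tie A-B equals tie B-A
sameNames = identical(rownames(toy), colnames(toy))    # same labels, same order
noLoops   = all(diag(toy)==0)                          # nobody is tied to themselves

c(isSquare, isSym, sameNames, noLoops)
```

If any of these came back FALSE for the real data, the copy-and-paste step would be the first place to look.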
From this kind of structure (the adjacency matrix), we can easily create a network with igraph:
library(igraph)
EliteNet=graph_from_adjacency_matrix(adjacency,mode="undirected",weighted=NULL)
# see it here
EliteNet
## IGRAPH a57fd12 UN-- 37 135 --
## + attr: name (v/c)
## + edges from a57fd12 (vertex names):
## [1] Romero--Grana Romero--Miro Quesada Romero--Moreyra
## [4] Romero--Fort Romero--De La Puente Romero--Onrubia
## [7] Romero--Brescia Romero--Nicolini Romero--Bentin
## [10] Romero--Benavides Romero--Bustamante Romero--Woodman Pollit
## [13] Romero--Raffo Romero--Piazza Romero--Berckemeyer
## [16] Romero--Llosa Barber Romero--Rizo Patron Romero--Montori
## [19] Grana --Miro Quesada Grana --Fort Grana --De La Puente
## [22] Grana --Wiese Grana --Montero Grana --Benavides
## + ... omitted several edges
A network is composed of nodes (aka vertices) and the edges that connect them. You can count each of them like this:
vcount(EliteNet) #count of nodes
## [1] 37
ecount(EliteNet) #count of edges
## [1] 135
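For an undirected network without self-loops, every tie appears twice in the adjacency matrix, so the edge count should equal half the sum of the matrix. A minimal sketch on a made-up matrix (the object names here are illustrative):

```r
library(igraph)

# Toy symmetric 0/1 matrix without self-loops (made-up data)
toy = matrix(c(0,1,1,
               1,0,0,
               1,0,0), nrow=3, byrow=TRUE,
             dimnames=list(c("A","B","C"), c("A","B","C")))

toyNet = graph_from_adjacency_matrix(toy, mode="undirected")

edgesFromMatrix = sum(toy)/2     # each tie is stored twice in the matrix
edgesFromGraph  = ecount(toyNet) # what igraph counted

c(edgesFromMatrix, edgesFromGraph)
```

Both numbers agree for the toy case; a mismatch on real data would hint at an asymmetric entry or a nonzero diagonal.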
You can see what this network looks like:
plot.igraph(EliteNet,
vertex.color = 'yellow',
edge.color='lightblue')
So far we only have nodes and their links. Let’s bring in some information about the nodes:
# The adjacency matrix did not include the nodes attributes.
attributes=import(linkAdjMx,which = 2)
head(attributes)
## Nodes multinational
## 1 Romero 1
## 2 Grana 1
## 3 Miro Quesada 1
## 4 Moreyra 1
## 5 Fort 1
## 6 De La Puente 1
Igraph can add an attribute easily. Let’s proceed with the change:
EliteNet=set_vertex_attr(EliteNet,"multi",value=attributes$multinational)
#then
EliteNet
## IGRAPH a57fd12 UN-- 37 135 --
## + attr: name (v/c), multi (v/n)
## + edges from a57fd12 (vertex names):
## [1] Romero--Grana Romero--Miro Quesada Romero--Moreyra
## [4] Romero--Fort Romero--De La Puente Romero--Onrubia
## [7] Romero--Brescia Romero--Nicolini Romero--Bentin
## [10] Romero--Benavides Romero--Bustamante Romero--Woodman Pollit
## [13] Romero--Raffo Romero--Piazza Romero--Berckemeyer
## [16] Romero--Llosa Barber Romero--Rizo Patron Romero--Montori
## [19] Grana --Miro Quesada Grana --Fort Grana --De La Puente
## [22] Grana --Wiese Grana --Montero Grana --Benavides
## + ... omitted several edges
It should have worked:
vertex_attr_names(EliteNet)
## [1] "name" "multi"
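Setting an attribute by position assumes that the rows of the attribute table come in the same order as the nodes of the network. When that is not guaranteed, one option is to align the table by name first. The sketch below uses a made-up network and table (`g`, `attrs` and their values are hypothetical) to show the idea with `match()`:

```r
library(igraph)

# Toy network and a deliberately shuffled attribute table (made-up data)
g = graph_from_literal(A-B, B-C)
attrs = data.frame(Nodes=c("C","A","B"),
                   multinational=c(0,1,1))

# For every node in the network, find its row in the table
pos = match(V(g)$name, attrs$Nodes)

# Assign the attribute in network order
g = set_vertex_attr(g, "multi", value=attrs$multinational[pos])

data.frame(node=V(g)$name, multi=V(g)$multi)
```

Any `NA` in `pos` would signal a node that has no row in the attribute table.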
Before going further, it is good to know if our network is connected:
is_connected(EliteNet)
## [1] FALSE
So these people are split into several components. How many?
components(EliteNet)$no
## [1] 8
Which nodes are in each component?
groups(components(EliteNet))
## $`1`
## [1] "Romero" "Grana" "Miro Quesada" "Moreyra"
## [5] "Fort" "De La Puente" "Wiese" "Onrubia"
## [9] "Brescia" "Nicolini" "Montero" "Picaso"
## [13] "Bentin" "Benavides" "Bustamante" "Woodman Pollit"
## [17] "Raffo" "Piazza" "Berckemeyer" "Llosa Barber"
## [21] "Beoutis Ledesma" "Rizo Patron" "Montori" "Sotomayor"
## [25] "Cilloniz" "Ferreyros" "Michell" "Wong Lu"
##
## $`2`
## [1] "Batievsky Spack" "Matos Escalada" "Galsky"
##
## $`3`
## [1] "Lucioni"
##
## $`4`
## [1] "Rodriguez Rodriguez"
##
## $`5`
## [1] "Custer"
##
## $`6`
## [1] "Ikeda"
##
## $`7`
## [1] "Cogorno"
##
## $`8`
## [1] "Arias Davila"
Let me add the component as an attribute:
component=components(EliteNet)$membership
EliteNet=set_vertex_attr(EliteNet,"component",value=component)
#then
EliteNet
## IGRAPH a57fd12 UN-- 37 135 --
## + attr: name (v/c), multi (v/n), component (v/n)
## + edges from a57fd12 (vertex names):
## [1] Romero--Grana Romero--Miro Quesada Romero--Moreyra
## [4] Romero--Fort Romero--De La Puente Romero--Onrubia
## [7] Romero--Brescia Romero--Nicolini Romero--Bentin
## [10] Romero--Benavides Romero--Bustamante Romero--Woodman Pollit
## [13] Romero--Raffo Romero--Piazza Romero--Berckemeyer
## [16] Romero--Llosa Barber Romero--Rizo Patron Romero--Montori
## [19] Grana --Miro Quesada Grana --Fort Grana --De La Puente
## [22] Grana --Wiese Grana --Montero Grana --Benavides
## + ... omitted several edges
A visual representation follows:
Labels=component
numberOfClasses = length(unique(Labels))
#preparing color
library(RColorBrewer)
colorForScale='Set2'
colors = brewer.pal(numberOfClasses, colorForScale)
# plotting
plot.igraph(EliteNet,
vertex.color = colors[Labels],
edge.color='lightblue')
Since we do not have one connected network but several components, we can focus on the giant component (the component with the most nodes). Follow these steps:
(Sizes=components(EliteNet)$csize)
## [1] 28 3 1 1 1 1 1 1
# this is a subnet
EliteNet_giant=induced_subgraph(EliteNet, which(Labels == which.max(Sizes)))
Let’s take a look at the Giant Component:
plot.igraph(EliteNet_giant)
Basic summary:
summary(EliteNet_giant)
## IGRAPH 2f79726 UN-- 28 133 --
## + attr: name (v/c), multi (v/n), component (v/n)
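The extraction of the giant component can be easier to follow on a small case. The sketch below repeats the same steps on a made-up network with three components (all object names are illustrative):

```r
library(igraph)

# Toy network (made-up data): a chain A-B-C, a pair D-E, and an isolate F
g = graph_from_literal(A-B, B-C, D-E, F)

comp    = components(g)
giantId = which.max(comp$csize)                # id of the largest component
keep    = which(comp$membership==giantId)      # nodes that belong to it
giant   = induced_subgraph(g, keep)            # those nodes and the ties among them

comp$no          # number of components
comp$csize       # size of each component
V(giant)$name    # members of the giant component
```

Note that `which.max()` returns only the first maximum, so if two components tied for largest, only one of them would be kept.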
From here on, we will use the giant component as the network to explore.
edge_density(EliteNet_giant)
## [1] 0.3518519
diameter(EliteNet_giant)
## [1] 4
# average of the local clustering coefficients:
transitivity(EliteNet_giant,type = 'average')
## [1] 0.6537019
mean_distance(EliteNet_giant)
## [1] 1.740741
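The density reported above can be reproduced by hand: it is the number of existing edges divided by the number of possible edges, which for an undirected network with n nodes is n(n-1)/2. A short check using the counts from the summary of the giant component (28 nodes, 133 edges):

```r
n = 28                      # nodes in the giant component
m = 133                     # edges in the giant component

possible = n*(n-1)/2        # all pairs of nodes that could be tied: 378
densityByHand = m/possible  # share of possible ties that actually exist

round(densityByHand, 7)     # 0.3518519
```

So roughly a third of all possible ties among these families are present.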
Random networks tend to have a short average path length and a low clustering coefficient. Is that the case here? The average path length is short (about 1.74), but the average clustering coefficient is high (about 0.65). That combination suggests a small world: most nodes are not neighbors of one another, yet most nodes can be reached from any other in a few steps.
transitivity(EliteNet_giant)
## [1] 0.5829694
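One way to judge whether the clustering is "high" is to compare it with random networks of the same size. The sketch below is only an illustration of that idea, not part of the original analysis: it draws 100 random networks with 28 nodes and 133 edges using `sample_gnm()` and averages their clustering and path length. The seed and the number of replicates are arbitrary choices.

```r
library(igraph)

set.seed(123)                 # arbitrary seed, only for reproducibility
n = 28; m = 133               # size of the giant component
reps = 100                    # arbitrary number of random networks

randomNets = replicate(reps, sample_gnm(n, m), simplify=FALSE)

# Average local clustering and average path length of each random network
randClust = mean(sapply(randomNets, transitivity, type="average"))
randPath  = mean(sapply(randomNets, mean_distance))

c(clustering=randClust, pathLength=randPath)
```

In a random network the clustering is expected to sit near the density, so a value well below the 0.65 observed in the elite network, together with a similarly short path length, would be consistent with the small-world reading.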
assortativity_degree(EliteNet_giant)
## [1] -0.1208671
You can also compute assortativity using an attribute of interest:
attrNet=V(EliteNet_giant)$multi
assortativity(EliteNet_giant,attrNet)
## [1] -0.07258065
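Assortativity ranges from -1 to 1: positive values mean that similar nodes tend to be tied to each other, negative values mean that dissimilar nodes do. A star is a handy extreme case to calibrate the reading, since every tie joins the highly connected center to a node with a single tie. A minimal sketch on a made-up star:

```r
library(igraph)

# Toy star (made-up data): node 1 is tied to the other five nodes
star = make_star(6, mode="undirected")

rStar = assortativity_degree(star)   # degree assortativity of the star
rStar
```

The star sits at the negative extreme. The values found for the elite network are negative too but close to zero, so any tendency of well-connected families to tie with less-connected ones is weak.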
Coloring by attribute:
LabelsColor=attrNet+1
colors=c('lightblue','magenta')
plot.igraph(EliteNet_giant,
vertex.color = colors[LabelsColor])
A clique can be understood as a set of nodes where all of them are connected to one another.
length(cliques(EliteNet_giant))
## [1] 1074
If a clique cannot be made bigger by adding another node, then you have a maximal clique.
# How many maximal cliques
count_max_cliques(EliteNet_giant)
## [1] 28
You can find the size of the largest clique (a maximum clique):
clique_num(EliteNet_giant)
## [1] 8
You can see each maximum clique like this:
max_cliques(EliteNet_giant,min=8)
## [[1]]
## + 8/28 vertices, named, from 2f79726:
## [1] Onrubia Romero Raffo Bentin Fort
## [6] Llosa Barber Woodman Pollit Nicolini
##
## [[2]]
## + 8/28 vertices, named, from 2f79726:
## [1] Onrubia Romero Raffo Bentin Berckemeyer Montori
## [7] Brescia Moreyra
##
## [[3]]
## + 8/28 vertices, named, from 2f79726:
## [1] Benavides Romero Piazza Bustamante De La Puente
## [6] Fort Miro Quesada Grana
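The difference between cliques, maximal cliques and maximum cliques is easier to see on a tiny case. The sketch below builds a made-up network: four nodes that are all tied to each other, plus a fifth node tied only to node 1.

```r
library(igraph)

# Toy network (made-up data): a complete group of 4 plus one pendant node
g = make_full_graph(4)
g = add_vertices(g, 1)        # node 5
g = add_edges(g, c(1, 5))     # node 5 is tied only to node 1

nAll     = length(cliques(g))      # every clique, including single nodes
nMaximal = count_max_cliques(g)    # cliques that cannot be extended
biggest  = clique_num(g)           # size of the largest clique

c(all=nAll, maximal=nMaximal, largest=biggest)
```

Here there are 17 cliques in total (15 subsets of the complete group, plus node 5 alone and the pair 1-5), but only 2 are maximal: the group of four and the pair 1-5. Only the group of four is a maximum clique.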
The presence of cliques in a network suggests that there may be communities.
Community detection is a huge field of research. Let me show you one of its algorithms, known as the Louvain method.
communities=cluster_louvain(EliteNet_giant)
(partition=membership(communities))
## Romero Grana Miro Quesada Moreyra Fort
## 1 2 2 3 2
## De La Puente Wiese Onrubia Brescia Nicolini
## 2 3 1 1 1
## Montero Picaso Bentin Benavides Bustamante
## 2 3 1 2 2
## Woodman Pollit Raffo Piazza Berckemeyer Llosa Barber
## 1 1 2 1 1
## Beoutis Ledesma Rizo Patron Montori Sotomayor Cilloniz
## 2 3 1 3 3
## Ferreyros Michell Wong Lu
## 2 2 1
Now, use those values to make a plot to highlight the communities:
Labels=partition
numberOfClasses = length(unique(Labels))
library(RColorBrewer)
colorForScale='Set2'
colors = brewer.pal(numberOfClasses, colorForScale)
plot.igraph(EliteNet_giant,
vertex.color = colors[Labels],
edge.color='lightblue')
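A partition can also be summarized with numbers: how many communities were found, how large they are, and the modularity of the split. The sketch below shows this on a made-up network where the answer is obvious (two complete groups of five joined by one tie). The seed is an arbitrary choice; the Louvain result may vary between runs in some igraph versions, so fixing one is a cheap precaution.

```r
library(igraph)

set.seed(2023)   # arbitrary seed, in case the algorithm uses random ordering

# Toy network (made-up data): two complete groups of 5 joined by a single tie
g = disjoint_union(make_full_graph(5), make_full_graph(5))
g = add_edges(g, c(1, 6))

comm = cluster_louvain(g)

nComm = length(comm)       # number of communities
sizes(comm)                # nodes per community
Q = modularity(comm)       # quality of the partition
c(communities=nComm, modularity=Q)
```

The same three calls could be applied to the `communities` object computed for the elite network.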
Let’s turn our attention to the nodes and their roles in the network.
rounding=3
degr=round(degree(EliteNet_giant,normalized=TRUE),rounding)
close=round(closeness(EliteNet_giant,normalized=TRUE),rounding)
betw=round(betweenness(EliteNet_giant,normalized=TRUE),rounding)
DFCentrality=as.data.frame(cbind(degr,close,betw),stringsAsFactors = F)
names(DFCentrality)=c('Degree','Closeness','Betweenness')
DFCentrality$person=row.names(DFCentrality)
row.names(DFCentrality)=NULL
head(DFCentrality)
## Degree Closeness Betweenness person
## 1 0.667 0.750 0.102 Romero
## 2 0.407 0.614 0.043 Grana
## 3 0.407 0.614 0.043 Miro Quesada
## 4 0.556 0.675 0.066 Moreyra
## 5 0.704 0.771 0.155 Fort
## 6 0.519 0.659 0.039 De La Puente
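Normalized values are easier to read with a reference case in mind. In a star, the center is tied to everyone, is one step away from everyone, and lies on every shortest path between the others, so all three normalized measures reach their maximum of 1. A minimal sketch on a made-up star:

```r
library(igraph)

# Toy star (made-up data): node 1 is the center, nodes 2 to 5 are leaves
star = make_star(5, mode="undirected")

d = degree(star, normalized=TRUE)        # share of the other nodes one is tied to
cl = closeness(star, normalized=TRUE)    # inverse of the average distance to the others
b = betweenness(star, normalized=TRUE)   # share of shortest paths passing through

round(data.frame(Degree=d, Closeness=cl, Betweenness=b), 3)
```

Read this way, a normalized degree of 0.704 for Fort would correspond to ties with 19 of the 27 other families in the giant component.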
library(ggplot2)
ggplot(DFCentrality, aes(x=Betweenness, y=Closeness)) + theme_classic()+
scale_size(range = c(1, 25)) + geom_text(aes(label=person,color=Degree)) +
scale_colour_gradient(low = "orange", high = "black")
The node with the highest degree could be considered a hub in the network:
DFCentrality[which.max(DFCentrality$Degree),]
## Degree Closeness Betweenness person
## 5 0.704 0.771 0.155 Fort
We can plot the neighbors of the hub, its ego network:
#who
hub=DFCentrality[which.max(DFCentrality$Degree),]$person
#where (a character to numeric)
hubix=as.numeric(row.names(DFCentrality[which.max(DFCentrality$Degree),]))
HubEgonets=make_ego_graph(EliteNet_giant, nodes=hubix)
# HubEgonets is a list, get the first one:
HubEgonet=HubEgonets[[1]]
egoSizes=rep(5,vcount(HubEgonet)) # sizes '5' for every node
egoSizes[V(HubEgonet)$name==hub]=40 # size '40' for the hub, located by name inside the ego network
V(HubEgonet)$size=egoSizes # saving sizes
plot.igraph(HubEgonet,
vertex.color = 'yellow',
edge.color='lightblue')
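An ego network is a new, smaller graph, so a node's position in it need not match its position in the full network. Looking the ego up by name avoids relying on that. A minimal sketch on a made-up named network (all names are illustrative):

```r
library(igraph)

# Toy named network (made-up data): A is tied to B, C and D; E hangs off D
g = graph_from_literal(A-B, A-C, A-D, D-E)

# Ego network of A: A plus its direct neighbors and the ties among them
ego = make_ego_graph(g, order=1, nodes="A")[[1]]
egoNames = V(ego)$name

hubPos = which(egoNames=="A")     # position of A inside the ego network
sizes = rep(5, vcount(ego))       # small size for every node
sizes[hubPos] = 40                # large size for the ego itself

data.frame(node=egoNames, size=sizes)
```

E is two steps away from A, so it is left out of the ego network.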
Can this network be disconnected? If so, we can compute the minimum number of nodes that must be removed to disconnect the network (create at least two components):
vertex_connectivity(EliteNet_giant)
## [1] 1
Who is the sole node with the power to break the network?
(cut=articulation_points(EliteNet_giant))
## + 1/28 vertex, named, from 2f79726:
## [1] Bentin
We can highlight the articulation node in the giant component:
cutix=which(V(EliteNet_giant)==cut)
allSizes=rep(10,vcount(EliteNet_giant))
allSizes[cutix]=40
V(EliteNet_giant)$size=allSizes # saving sizes
plot.igraph(EliteNet_giant,
vertex.color = 'yellow',
edge.color='lightblue',vertex.shape='sphere')
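What an articulation point does can be verified directly: remove it and count the components again. The sketch below uses a made-up network of two triangles that share one node, C.

```r
library(igraph)

# Toy network (made-up data): triangles A-B-C and C-D-E share node C
g = graph_from_literal(A-B, B-C, C-A, C-D, D-E, E-C)

kappa   = vertex_connectivity(g)      # minimum number of nodes to remove
cutNode = articulation_points(g)      # the nodes whose removal disconnects g

before = components(g)$no                         # components with C present
after  = components(delete_vertices(g, "C"))$no   # components once C is gone

cutNode$name
c(connectivity=kappa, before=before, after=after)
```

The same before-and-after comparison could be run on the giant component by deleting Bentin.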